Fix more instances of schema missing metadata #13068

Merged
merged 1 commit into from
Oct 25, 2024

Conversation

itsjunetime
Contributor

Which issue does this PR close?

This does not close #12733, but it helps with the situation described there.

Rationale for this change

As I was trying to fix other issues, I ran into more instances where Schema metadata could theoretically be dropped. These specific issues don't have reproducers (since I just found them while working on other stuff), but I think they are clearly issues nonetheless.

What changes are included in this PR?

Mostly changing Schema::new to Schema::new_with_metadata and schema_builder.finish() to schema_builder.finish().with_metadata(...)
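
To illustrate why this substitution matters, here is a minimal sketch. The `Field` and `Schema` types below are simplified stand-ins written for this example so it compiles without the arrow crate; only the method names (`new`, `new_with_metadata`, `with_metadata`) mirror the ones mentioned above. The helper functions and the `origin` metadata key are hypothetical.

```rust
use std::collections::HashMap;

// Simplified stand-ins for the Arrow `Field` and `Schema` types (an
// assumption of this sketch, not the real arrow-rs definitions).
#[derive(Clone, Debug, PartialEq)]
struct Field {
    name: String,
}

#[derive(Clone, Debug, PartialEq)]
struct Schema {
    fields: Vec<Field>,
    metadata: HashMap<String, String>,
}

impl Schema {
    /// Builds a schema with empty metadata.
    fn new(fields: Vec<Field>) -> Self {
        Self { fields, metadata: HashMap::new() }
    }

    /// Builds a schema that carries the given metadata.
    fn new_with_metadata(fields: Vec<Field>, metadata: HashMap<String, String>) -> Self {
        Self { fields, metadata }
    }

    /// Replaces the metadata on an already built schema.
    fn with_metadata(mut self, metadata: HashMap<String, String>) -> Self {
        self.metadata = metadata;
        self
    }
}

/// Hypothetical helper: rebuilds a schema from its fields only, which
/// silently drops whatever metadata the input carried.
fn rebuild_lossy(input: &Schema) -> Schema {
    Schema::new(input.fields.clone())
}

/// Hypothetical helper: rebuilds a schema and forwards the input metadata.
fn rebuild_preserving(input: &Schema) -> Schema {
    Schema::new_with_metadata(input.fields.clone(), input.metadata.clone())
}

/// Hypothetical sample input with one field and one metadata entry.
fn sample_schema() -> Schema {
    let fields = vec![Field { name: "id".to_string() }];
    let metadata = HashMap::from([("origin".to_string(), "example".to_string())]);
    Schema::new_with_metadata(fields, metadata)
}

fn main() {
    let input = sample_schema();

    // The fields survive either way, but only one variant keeps the metadata.
    assert!(rebuild_lossy(&input).metadata.is_empty());
    assert_eq!(rebuild_preserving(&input), input);

    // The builder-style fix: construct first, then attach the metadata.
    let rebuilt = Schema::new(input.fields.clone()).with_metadata(input.metadata.clone());
    assert_eq!(rebuilt, input);
}
```

Both variants produce the same fields, so the loss only shows up when something downstream reads the metadata, which is likely why these cases are hard to reproduce.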

Are these changes tested?

Yes, all tests still pass.

Are there any user-facing changes?

No

github-actions bot added the logical-expr (Logical plan and expressions), physical-expr (Physical Expressions), and core (Core DataFusion crate) labels on Oct 22, 2024
Contributor

@alamb alamb left a comment

Thanks @itsjunetime -- I agree these all look like good fixes. I am a little worried about the lack of test coverage, but I agree most of the situations look like they would be hard to write tests for.

I had some thoughts in apache/arrow-rs#6576 about how to change the API for building a Schema so that it is harder to accidentally drop metadata -- maybe we can use this PR as a use case for that API

@@ -1014,14 +1014,21 @@ impl DefaultPhysicalPlanner {
})
.collect();

let metadata: HashMap<_, _> = left_df_schema
this one we should be able to write a test for -- perhaps with a query with a JOIN in https://github.com/apache/datafusion/blob/main/datafusion/sqllogictest/test_files/metadata.slt
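
For reference, a test of the join case would need to check that the output schema carries metadata from both inputs. The sketch below shows that property on plain `HashMap`s rather than DataFusion types. The function name `merge_join_metadata` is hypothetical, and the policy that the right-hand value wins on a key collision is an assumption of this sketch, not something the PR confirms.

```rust
use std::collections::HashMap;

/// Hypothetical helper: combines the metadata maps of the two join inputs.
/// Entries are inserted left first, then right, so on a key collision the
/// right-hand value overwrites the left-hand one (an assumed policy).
fn merge_join_metadata(
    left: &HashMap<String, String>,
    right: &HashMap<String, String>,
) -> HashMap<String, String> {
    left.iter()
        .chain(right.iter())
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect()
}

fn main() {
    let left = HashMap::from([
        ("left_only".to_string(), "a".to_string()),
        ("shared".to_string(), "from_left".to_string()),
    ]);
    let right = HashMap::from([
        ("right_only".to_string(), "b".to_string()),
        ("shared".to_string(), "from_right".to_string()),
    ]);

    let merged = merge_join_metadata(&left, &right);

    // Keys from both sides are present in the merged map.
    assert_eq!(merged.get("left_only").map(String::as_str), Some("a"));
    assert_eq!(merged.get("right_only").map(String::as_str), Some("b"));
    // The colliding key takes the right-hand value under the assumed policy.
    assert_eq!(merged.get("shared").map(String::as_str), Some("from_right"));
    assert_eq!(merged.len(), 3);
}
```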

Contributor

@jayzhan211 jayzhan211 left a comment

👍

@alamb alamb merged commit 7f32dce into apache:main Oct 25, 2024
24 checks passed
@alamb alamb deleted the june/fix_more_missing_schema_metadata branch October 25, 2024 13:19
Contributor

alamb commented Oct 25, 2024

It would be great to figure out how to test this code better but maybe that is not feasible at this time -- thanks again @itsjunetime and @jayzhan211
